 Loans


The Role of Causal Features in Strategic Classification for Robustness and Alignment

arXiv.org Machine Learning

In strategic classification, an institution (e.g., a bank) anticipates adaptation from users who change their features to increase utility in a classification task (e.g., loan repayment). Since a key challenge is the distribution shift induced by users, we turn to causal models, which have been shown to bound the worst-case out-of-distribution (OOD) risk, and establish several new results that link causality and strategic classification. First, we show that causal classification leads to optimal classification error after any sufficiently large adaptation, when the noise is bounded in a certain way. Second, when these as-

As we develop better algorithms under varying assumptions about adaptation (Levanon and Rosenfeld, 2022; Kleinberg and Raghavan, 2018), there are growing concerns about negative social impact on the agents who adapt to these systems, whether outcomes are static (Milli et al., 2019) or dynamic (Góis et al., 2025). When agents adapt, depending on the underlying causal model (Horowitz and Rosenfeld, 2018; Miller et al., 2020), some changes improve agent outcomes while others constitute gaming the classifier, worsening classification error. In this paper, we study whether classifiers can maintain accuracy without sacrificing alignment with predicted agent's goals.
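The distinction between improving an outcome and gaming a classifier can be illustrated with a toy example. The sketch below is not the paper's method: the feature names `causal` and `proxy`, the threshold, the adaptation budget, and the four agents are all hypothetical, and repayment is assumed to depend only on the causal feature.

```python
# Toy illustration (all names and numbers are assumptions, not from the paper).
# Repayment is assumed to depend only on the "causal" feature; "proxy" is a
# correlated feature that does not influence the outcome.

def error_after_adaptation(agents, feature, threshold=1.0, budget=1.0):
    """Error rate of a threshold classifier on `feature` after agents adapt."""
    mistakes = 0
    for agent in agents:
        moved = dict(agent)
        gap = threshold - moved[feature]
        # A rejected agent raises the feature the classifier looks at,
        # provided the required change is within its budget.
        if 0 < gap <= budget:
            moved[feature] = threshold
        accepted = moved[feature] >= threshold
        # The true outcome is evaluated on the post-adaptation causal feature.
        repays = moved["causal"] >= threshold
        mistakes += accepted != repays
    return mistakes / len(agents)


agents = [
    {"causal": 0.2, "proxy": 0.3},
    {"causal": 0.6, "proxy": 0.5},
    {"causal": 1.4, "proxy": 1.5},
    {"causal": 1.2, "proxy": 1.1},
]

proxy_error = error_after_adaptation(agents, "proxy")    # agents game the proxy
causal_error = error_after_adaptation(agents, "causal")  # agents truly improve
print(proxy_error, causal_error)
```

In this toy setting, adapting the proxy gets unqualified agents accepted without changing whether they repay, while adapting the causal feature changes the outcome itself, so the classifier on the causal feature stays accurate.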


Causal Algorithmic Recourse: Foundations and Methods

arXiv.org Machine Learning

The trustworthiness of AI decision-making systems is increasingly important. A key feature of such systems is the ability to provide recommendations for how an individual may reverse a negative decision, a problem known as algorithmic recourse. Existing approaches treat recourse outcomes as counterfactuals of a fixed unit, ignoring that real-world recourse involves repeated decisions on the same individual under possibly different latent conditions. We develop a causal framework that models recourse as a process over pre- and post-intervention outcomes, allowing for partial stability and resampling of latent variables. We introduce post-recourse stability conditions that enable reasoning about recourse from observational data alone, and develop a copula-based algorithm for inferring the effects of recourse under these conditions. For settings where paired observations of the same individual before and after intervention are available (called recourse data), we develop methods for inferring copula parameters and performing goodness-of-fit testing. When the copula model is rejected, we provide a distribution-free algorithm for learning recourse effects directly from recourse data. We demonstrate the value of the proposed methods on real and semi-synthetic datasets.


Neural Pseudo-Label Optimism for the Bank Loan Problem

Neural Information Processing Systems

We study a class of classification problems best exemplified by the bank loan problem, where a lender decides whether or not to issue a loan. The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions. As a result, it is possible for the lender's algorithm to "get stuck" with a self-fulfilling model. This model never corrects its false negatives, since it never sees the true label for rejected data, thus accumulating infinite regret. In the case of linear models, this issue can be addressed by adding optimism directly into the model predictions. However, there are few methods that extend to the function approximation case using Deep Neural Networks.
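The self-fulfilling loop described above can be reproduced in a few lines. The following is a minimal sketch under stated assumptions, not the pseudo-label method of the paper: a single customer segment, a running-mean estimate of its repayment rate, a hypothetical acceptance threshold of 0.5, and a count-based optimism bonus of the form `optimism / sqrt(n_obs)`.

```python
import math
import random


def simulate(true_repay_rate, prior_estimate, rounds, optimism, seed=0):
    """Lend only when estimate + bonus >= 0.5; labels are seen only for loans."""
    rng = random.Random(seed)
    # The prior is treated as one pseudo-observation (an assumption of this sketch).
    estimate, n_obs, loans = prior_estimate, 1, 0
    for _ in range(rounds):
        bonus = optimism / math.sqrt(n_obs)
        if estimate + bonus >= 0.5:
            loans += 1
            repaid = 1.0 if rng.random() < true_repay_rate else 0.0
            n_obs += 1
            # Running-mean update, possible only because the loan was issued.
            estimate += (repaid - estimate) / n_obs
        # If the loan is rejected, nothing is observed and nothing is updated.
    return loans, estimate


# A pessimistic prior with no optimism: the lender never lends, never learns.
greedy_loans, greedy_estimate = simulate(0.8, 0.3, rounds=2000, optimism=0.0)

# The same prior with an optimism bonus: early loans reveal the true rate.
opt_loans, opt_estimate = simulate(0.8, 0.3, rounds=2000, optimism=1.0)
print(greedy_loans, greedy_estimate, opt_loans, round(opt_estimate, 3))
```

Without the bonus the estimate stays at its prior because rejected customers never yield a label, which is the false-negative trap the abstract describes. The bonus shrinks as observations accumulate, so it mainly affects early decisions.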



How Do Fair Decisions Fare in Long-term Qualification?

Neural Information Processing Systems

We examine whether these static fairness constraints mitigate or worsen the qualification disparity in the long run. Our work can be applied to a variety of applications such as recruitment and bank lending. In these applications, an institute observes individuals' features (e.g., credit scores), and makes myopic decisions (e.g., issue loans) by assessing such features against some variables of interest (e.g., ability to repay) which are unknown and unobservable to the institute when making decisions.




Incorporating data drift to perform survival analysis on credit risk

arXiv.org Machine Learning

Survival analysis has become a standard approach for modelling time to default with time-varying covariates in credit risk. Most existing methods implicitly assume a stationary data-generating process, but in practice mortgage portfolios are exposed to various forms of data drift caused by changing borrower behaviour, macroeconomic conditions, policy regimes and other factors. This study investigates the impact of data drift on survival-based credit risk models and proposes a dynamic joint modelling framework to improve robustness under non-stationary environments. The proposed model integrates a longitudinal behavioural marker derived from balance dynamics with a discrete-time hazard formulation, combined with landmark one-hot encoding and isotonic calibration. Three types of data drift (sudden, incremental and recurring) are simulated and analysed on mortgage loan datasets from Freddie Mac. Experiments show that the proposed landmark-based joint model consistently outperforms classical survival models, tree-based drift-adaptive learners and gradient boosting methods in terms of discrimination and calibration across all drift scenarios, which supports the proposed model design.
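For readers unfamiliar with discrete-time hazard models, the standard relationship between per-period hazards and survival is S(t) = (1 - h_1)(1 - h_2)...(1 - h_t). The sketch below illustrates only this textbook identity; the hazard values and the function name `survival_curve` are made up for illustration and are not taken from the study.

```python
def survival_curve(hazards):
    """Turn per-period default hazards into survival probabilities.

    hazards[k] is the probability of default in period k+1, given the loan
    has survived all earlier periods.
    """
    survival, alive = [], 1.0
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        alive *= 1.0 - h  # survive this period, given survival so far
        survival.append(alive)
    return survival


# Hypothetical hazards for three consecutive periods.
hazards = [0.1, 0.2, 0.5]
survival = survival_curve(hazards)
cumulative_default = [1.0 - s for s in survival]
print(survival, cumulative_default)
```

A model of the kind described would predict the hazards from covariates for each period; the conversion to a survival curve is then the same product.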


FSL-BDP: Federated Survival Learning with Bayesian Differential Privacy for Credit Risk Modeling

arXiv.org Machine Learning

Credit risk models are a critical decision-support tool for financial institutions, yet tightening data-protection rules (e.g., GDPR, CCPA) increasingly prohibit cross-border sharing of borrower data, even as these models benefit from cross-institution learning. Traditional default prediction suffers from two limitations: binary classification ignores default timing, treating early defaulters (high loss) equivalently to late defaulters (low loss), and centralized training violates emerging regulatory constraints. We propose a Federated Survival Learning framework with Bayesian Differential Privacy (FSL-BDP) that models time-to-default trajectories without centralizing sensitive data. The framework provides Bayesian (data-dependent) differential privacy (DP) guarantees while enabling institutions to jointly learn risk dynamics. Experiments on three real-world credit datasets (LendingClub, SBA, Bondora) show that federation fundamentally alters the relative effectiveness of privacy mechanisms. While classical DP performs better than Bayesian DP in centralized settings, the latter benefits substantially more from federation (+7.0% vs +1.4%), achieving near parity with non-private performance and outperforming classical DP in the majority of participating clients. This ranking reversal yields a key decision-support insight: privacy mechanism selection should be evaluated in the target deployment architecture, rather than on centralized benchmarks. These findings provide actionable guidance for practitioners designing privacy-preserving decision support systems in regulated, multi-institutional environments.

